The IBM trainable speech synthesis system
Abstract
The speech synthesis system described in this paper uses a set of speaker-dependent decision-tree state-clustered hidden Markov models to automatically generate a leaf level segmentation of a large single-speaker continuous-read-speech database. During synthesis, the phone sequence to be synthesised is converted to an acoustic leaf sequence by descending the HMM decision trees. Duration, energy and pitch values are predicted using separate trainable models. To determine the segment sequence to concatenate, a dynamic programming (d.p.) search is performed over all the waveform segments aligned to each leaf in training. The d.p. attempts to ensure that the selected segments join each other spectrally, and have durations, energies and pitches such that the amount of degradation introduced by the subsequent use of TD-PSOLA is minimised. Algorithms embedded within the d.p. can alter the required acoustic leaf sequence, duration and energy values to ensure high quality synthetic speech. The selected segments are concatenated and modified to have the required prosodic values using the TD-PSOLA algorithm. The d.p. results in the system effectively selecting variable length units, based upon its leaf level framework.
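The d.p. search described in the abstract can be illustrated with a minimal Viterbi-style sketch. Everything named here is an assumption made for illustration: the `Segment` and `Target` records, the one-dimensional stand-in for the spectrum at each segment edge, and the `target_cost` and `join_cost` formulas are hypothetical and are not taken from the paper. The sketch also leaves out the embedded algorithms that alter the leaf sequence, duration and energy values, as well as the TD-PSOLA step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A waveform segment aligned to one leaf (hypothetical record)."""
    seg_id: str
    duration: float    # seconds
    energy: float
    pitch: float       # Hz
    start_spec: float  # toy 1-D stand-in for the spectrum at the left edge
    end_spec: float    # toy 1-D stand-in for the spectrum at the right edge


@dataclass(frozen=True)
class Target:
    """Predicted prosody for one leaf (hypothetical record)."""
    duration: float
    energy: float
    pitch: float


def target_cost(seg, tgt):
    # Assumed cost: relative duration and pitch mismatch plus absolute
    # energy mismatch, a proxy for how much prosodic modification is needed.
    return (abs(seg.duration - tgt.duration) / tgt.duration
            + abs(seg.energy - tgt.energy)
            + abs(seg.pitch - tgt.pitch) / tgt.pitch)


def join_cost(prev, nxt):
    # Assumed cost: spectral mismatch across the concatenation point.
    return abs(prev.end_spec - nxt.start_spec)


def select_segments(candidates, targets):
    """Return (best segment path, total cost).

    candidates[i] lists the segments aligned to leaf i in training;
    targets[i] is the predicted prosody for leaf i.
    """
    if not candidates or len(candidates) != len(targets):
        raise ValueError("need one non-empty candidate list per target")
    if any(not c for c in candidates):
        raise ValueError("every leaf needs at least one candidate segment")

    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(s, targets[0]), -1) for s in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for s in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, s), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(s, targets[i]), back))
        best.append(row)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch a run of segments that were adjacent in the training recording would have a join cost near zero, which is one plausible way a leaf-level search ends up picking longer, variable-length units.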
Similar resources
A component by component listening test analysis of the IBM trainable speech synthesis system
This paper reports on a listening test conducted to determine the impact on speech quality of each component in the IBM Trainable Speech Synthesiser. The study was originally conceived to direct future research effort to those components with the greatest potential for improvement. However, the results and conclusions regarding prosodic modification, concatenation unit length, and decision tree...
Reducing the footprint of the IBM trainable speech synthesis system
This paper presents a novel approach for concatenative speech synthesis. This approach enables reduction of the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A spectral acoustic feature based speech representation is used for computing a cost function during segment selection as well as for speech gen...
Current status of the IBM Trainable Speech Synthesis System
This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state...
Data-driven Segment Pres Trainable Speech Syn
Unit selection based concatenative speech synthesis has proven to be a successful method of producing high quality speech output. However, in order to produce high quality speech, large speech databases are required. For some applications, this is not practical due to the complexity of the database search process and the storage requirements of such databases. In this paper, we propose a data-d...
Phrase splicing and variable substitution using the IBM trainable speech synthesis system
This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speechproduction lying in-between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application specific set of pre-recorded phrases. The text to be synthesised is converted to a phone se...
Publication date: 1998